Network Pruning via Transformable Architecture Search
Network pruning reduces the computation cost of an over-parameterized network without damaging performance. Prevailing pruning algorithms pre-define the width and depth of the pruned network and then transfer parameters from the unpruned network to the pruned one. To break this structural limitation of pruned networks, we propose to apply neural architecture search to directly search for a network with flexible channel and layer sizes. The number of channels/layers is learned by minimizing the loss of the pruned network. The feature map of the pruned network is an aggregation of K feature-map fragments (generated by K networks of different sizes), which are sampled based on the probability distribution.
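The aggregation step above can be made concrete with a short sketch: a learnable categorical distribution over candidate channel counts is sampled K times, and the layer output is the probability-weighted sum of the corresponding feature-map fragments. This is a minimal illustration under our own assumptions (names such as `CandidateWidthConv` and `width_logits` are invented here), not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CandidateWidthConv(nn.Module):
    """One conv layer whose effective width is searched (sketch of the idea above)."""
    def __init__(self, in_ch, candidate_widths, k=2):
        super().__init__()
        self.max_ch = max(candidate_widths)
        self.conv = nn.Conv2d(in_ch, self.max_ch, kernel_size=3, padding=1)
        self.candidate_widths = candidate_widths              # e.g. [16, 32, 48, 64]
        self.width_logits = nn.Parameter(torch.zeros(len(candidate_widths)))
        self.k = k                                            # number of sampled fragments

    def forward(self, x):
        full = self.conv(x)                                   # widest feature map
        probs = F.softmax(self.width_logits, dim=0)           # distribution over widths
        idx = torch.multinomial(probs, self.k)                # sample K candidate widths
        out = torch.zeros_like(full)
        total = full.new_zeros(())
        for i in idx:
            c = self.candidate_widths[int(i)]
            # fragment produced by a narrower network, zero-padded to the widest size
            frag = torch.cat([full[:, :c], torch.zeros_like(full[:, c:])], dim=1)
            out = out + probs[i] * frag                       # probability-weighted aggregation
            total = total + probs[i]
        return out / total                                    # renormalize over the K samples
```

In the paper the distribution and the network weights are trained jointly, so that minimizing the loss of the pruned network drives the learned widths; the sketch only shows the forward aggregation.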
Pruning neural networks without any data by iteratively conserving synaptic flow
Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory, and energy, both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking at the data? We provide an affirmative answer to this question through theory-driven algorithm design. We first mathematically formulate and experimentally verify a conservation law that explains why existing gradient-based pruning algorithms at initialization suffer from layer-collapse, the premature pruning of an entire layer that renders a network untrainable. This theory also elucidates how layer-collapse can be entirely avoided, motivating a novel pruning algorithm, Iterative Synaptic Flow Pruning (SynFlow). The algorithm can be interpreted as preserving the total flow of synaptic strengths through the network at initialization, subject to a sparsity constraint. Notably, it makes no reference to the training data and consistently competes with or outperforms existing state-of-the-art pruning algorithms at initialization over a range of models (VGG and ResNet), datasets (CIFAR-10/100 and Tiny ImageNet), and sparsity constraints (up to 99.99 percent). Thus our data-agnostic pruning algorithm challenges the existing paradigm that, at initialization, data must be used to quantify which synapses are important.
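The abstract spells out the algorithm's ingredients (all positive weights, an all-ones input, a scalar "synaptic flow" objective, iterative global pruning), so a compact sketch follows. The hyperparameter choices (number of pruning iterations, the exponential sparsity schedule) and helper names are our assumptions for illustration, not the authors' reference code.

```python
import torch

@torch.no_grad()
def linearize(model):
    """Replace every parameter by its absolute value, remembering the signs."""
    signs = {}
    for name, tensor in model.state_dict().items():
        signs[name] = torch.sign(tensor)
        tensor.abs_()
    return signs

@torch.no_grad()
def restore(model, signs):
    for name, tensor in model.state_dict().items():
        tensor.mul_(signs[name])

def synflow_prune(model, input_shape, final_sparsity=0.99, n_iterations=100):
    """Data-free pruning: score each weight by |theta * dR/dtheta| for an all-ones input."""
    model.eval()
    signs = linearize(model)                                  # positive weights keep flows positive
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}
    ones = torch.ones(1, *input_shape)                        # training data is never touched
    for it in range(1, n_iterations + 1):
        with torch.no_grad():                                 # zero out already-pruned weights
            for n, p in model.named_parameters():
                if n in masks:
                    p.mul_(masks[n])
        model.zero_grad()
        R = model(ones).sum()                                 # total synaptic flow through the net
        R.backward()
        scores = {n: (p.grad * p).abs() for n, p in model.named_parameters() if n in masks}
        # exponentially shrinking keep-fraction across iterations (a common schedule choice)
        keep_frac = (1.0 - final_sparsity) ** (it / n_iterations)
        flat = torch.cat([s.flatten() for s in scores.values()])
        k = max(1, int(keep_frac * flat.numel()))
        threshold = torch.topk(flat, k, largest=True).values.min()
        masks = {n: (s >= threshold).float() for n, s in scores.items()}
    restore(model, signs)
    return masks
```

Calling, say, `synflow_prune(net, (3, 32, 32))` returns per-layer binary masks; applying them multiplicatively at initialization yields the sparse subnetwork that is then trained as usual.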
The paper received positive scores (7, 7, 7, 6), and all the reviewers appreciated the paper for the following: (i) theoretical contributions …

We thank all the reviewers for their time and effort in providing feedback. For clarity, we would like to reiterate the goal and motivation of the paper. We address the individual concerns below. We thank R3 for pointing out the typo. Thus, the approximated network achieved 97.17% test set accuracy with …; on the other hand, one of our networks resulting from edge-popup achieved 97.53% test set accuracy by retaining … We would again like to thank the reviewers for their positive reviews.
How Sparse Can We Prune A Deep Network: A Fundamental Limit Perspective
Network pruning is a commonly used measure to alleviate the storage and computational burden of deep neural networks. However, a characterization of the fundamental limit of network pruning is still lacking. To close this gap, in this work we take a first-principles approach: we directly impose the sparsity constraint on the loss function and leverage the framework of statistical dimension in convex geometry, enabling us to characterize the sharp phase-transition point, which can be regarded as the fundamental limit of the pruning ratio. Through this limit, we are able to identify two key factors that determine the pruning-ratio limit, namely weight magnitude and network sharpness.
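One schematic way to write the setup described in this abstract is as a sparsity-constrained perturbation problem around the trained weights. The notation below is ours, and the second-order expansion is only meant to show where the two factors named above (weight magnitude and sharpness, via the Hessian) enter, not to reproduce the paper's exact analysis.

```latex
% Schematic formulation (our notation): prune the trained weights w^* to a
% sparse vector w while keeping the loss increase within a tolerance \epsilon.
\min_{w}\; \|w\|_0
\quad \text{s.t.} \quad L(w) - L(w^*) \le \epsilon .
% Expanding the constraint to second order around w^* shows how the magnitudes
% of the pruned coordinates and the loss sharpness (Hessian H) interact:
L(w) - L(w^*) \;\approx\; \tfrac{1}{2}\,(w - w^*)^{\top} H\, (w - w^*),
\qquad H = \nabla^2 L(w^*).
```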
Part of this work was done while Gintare Karolina Dziugaite and Daniel M. Roy …

We complement the description of experimental methods in Section 2 with additional details. … We use all 10,000 original test images as our test set. … We use all 50,000 original test images as our test set. We follow the method that Paul et al. …

C. Size-Reduction Does Not Explain Pruning's Benefits to Generalization

… The algorithm otherwise behaves like learning rate rewinding.